Geographic disparities and predictors of COVID-19 hospitalization risks in the St. Louis Area, Missouri (USA)

Background There is evidence of geographic disparities in COVID-19 hospitalization risks that, if identified, could guide control efforts. Therefore, the objective of this study was to investigate Zip Code Tabulation Area (ZCTA)-level geographic disparities and identify predictors of COVID-19 hospitalization risks in the St. Louis area. Methods Hospitalization data for COVID-19 and several chronic diseases were obtained from the Missouri Hospital Association. ZCTA-level data on socioeconomic and demographic factors were obtained from the American Community Survey. Geographic disparities in distribution of COVID-19 age-adjusted hospitalization risks, socioeconomic and demographic factors as well as chronic disease risks were investigated using choropleth maps. Predictors of ZCTA-level COVID-19 hospitalization risks were investigated using global negative binomial and local geographically weighted negative binomial models. Results COVID-19 hospitalization risks were significantly higher in ZCTAs with high diabetes hospitalization risks (p < 0.0001), COVID-19 risks (p < 0.0001), black population (p = 0.0416), and populations with some college education (p = 0.0005). The associations between COVID-19 hospitalization risks and the first three predictors varied by geographic location. Conclusions There is evidence of geographic disparities in COVID-19 hospitalization risks that are driven by differences in socioeconomic, demographic and health-related factors. The impacts of these factors vary by geographical location implying that a ‘one-size-fits-all’ approach may not be appropriate for management and control. Using both global and local models leads to a better understanding of geographic disparities. These findings are useful for informing health planning to identify geographic areas likely to have high numbers of individuals needing hospitalization as well as guiding vaccination efforts.

Igoe et al. BMC Public Health (2022) 22:321 of Missouri has reported over 1.1 million cases and 16,500 + deaths. The disease has overwhelmed many United States hospital systems, with large numbers of patients requiring critical care and interventions such as mechanical ventilation [2][3][4]. This surge of COVID-19 patients has put strain on hospital resources, potentially impacting the care available to both COVID-19 and non-COVID-19 patients [5].
There is evidence of geographic disparities in the severity of the disease with certain population groups experiencing more severe disease than others. These disparities might be driven by population characteristics such as socioeconomic, demographic, and chronic disease factors [6][7][8][9][10]. For instance, there is evidence that Non-Hispanic American Indian, non-Hispanic Black, and Hispanic people have higher hospitalization risks than their non-Hispanic Asian and non-Hispanic White counterparts [11].There are also reports that conditions such as diabetes mellitus, obesity, chronic lung conditions, renal disease, cancer, and cardiovascular disease may increase the severity of the condition and risk of hospitalizations among COVID-19 patients with these comorbidities [6,[12][13][14]. Identifying geographic disparities in COVID-19 hospitalization risks and determinants of these disparities is important in providing information to guide hospital preparedness to handle the patient surge and for targeting resources for public health efforts to control the condition at the community level.
Improved understanding of the geographic disparities and predictors of COVID-19 hospitalization risks at the local level, such as the Zip Code Tabulation Areas (ZCTA), would help identify local geographic areas with higher needs for hospital beds and other healthcare resources. This is useful information for healthcare planning and service provision at the local level. It may also inform vaccination efforts by helping to identify areas where higher vaccination coverage may have the largest effect in reducing hospital burden. Therefore, the objective of this study was to identify geographic disparities and predictors of COVID-19 hospitalization risk in the St. Louis Area, Missouri, United States.

Study area
This study was performed in an area of Missouri that includes 108 ZCTAs located in St. Charles county, St. Louis county, and St. Louis City as well as parts of Jefferson, Franklin and Warren counties (Fig. 1). The area had a population of approximately 2 million people comprising 74% white, 20% black, 3% Hispanic/Latino, and 3% Asian. Forty-eight percent of the population was male, while 52% was female. Thirty-one percent of the population was older than 25 years old, 53% was 25-64, and 16% was older than 65 years of age. The ZCTA-level population density varied from 10 people per square mile in St. Charles County to 9,368 people per square mile in St. Louis City County.

Data sources COVID-19 and chronic disease hospitalization data
Data on COVID-19 and chronic condition hospitalizations were obtained from the Hospital Industry Data Institute, a not-for-profit organization founded by the Missouri Hospital Association that summarizes state level discharge data. The COVID-19 data provided information on the numbers of COVID-19 discharges by patient ZCTA and by age group from April 1, 2020 to September 30, 2020. In addition to the COVID-19 hospitalization data, data on the number of confirmed COVID-19 cases were obtained from the county Departments of Health. COVID-19 case data were included as one of the potential predictors of COVID-19 hospitalization risks. Data on chronic conditions were extracted for the time period 2019-2020 based on ICD-10 codes and included the following conditions/behaviors: obesity, tobacco use, cancer (breast, colorectal, prostate, lung, endometrial, leukemia and lymphoma), chronic obstructive pulmonary disease, chronic kidney disease, heart failure, and diabetes. These conditions were selected due to their potential association with COVID-19 severity. All data were aggregated to the ZCTA-level to facilitate subsequent ZCTA-level analyses. The ZCTA-level risks of these conditions were then computed and presented as disease-specific hospitalizations per 100 population.

Socioeconomic, demographic and base map data
ZCTA-level data on socioeconomic and demographic factors including age, sex, race, population density, education level, and median income, were obtained from the U.S. Census Bureau 2018 American Community Survey (ACS) 5-year estimates [15]. The cartographic boundary files, used for generating maps, were obtained from the US Census Bureau Website [16].

Descriptive analyses
All descriptive analyses were performed in R version 4.1.0 [17] using RStudio version 1.4.1103 [18]. Normality of distribution of continuous variables was assessed using Shapiro-Wilk test. Medians as well as 1 st and 3 rd quartiles were used as measures of central tendency and spread for all continuous variables since they were all non-normally distributed. The COVID-19 ZCTA-level hospitalization risks were directly age-standardized in STATA version 16 [19], using the 2010 US population as the standard population.

Global models
Univariable global Poisson models were used to assess simple associations between each of the potential predictors and COVID-19 hospitalization risk. As opposed to local models that estimate as many regression coefficients per predictor as the number of geographic units, global models assume constant relationships between each potential predictor and the outcome and therefore estimate one regression coefficient per potential predictor. These models were fit under the Generalized Linear Model (GLM) framework in R [17] specifying log link. The dependent variable was the expected count of COVID-19 hospitalizations in each ZCTA based on the age-adjusted/standardized number of COVID-19 hospitalizations and the offset was the natural log of the population in each ZCTA. A relaxed α of 0.15 was used to assess potentially significant predictors in univariable models. The linearity of the relationships between the log risk and potential predictor variables was assessed graphically.
Spearman rank correlation coefficients were computed to assess bivariate correlations among the potential predictor variables identified during the univariable analyses using R [17]. Variables with correlation coefficients greater than 0.7 were considered highly correlated. To avoid multicollinearity, only one of a pair of highly correlated variables was retained for further assessments using multivariable Poisson model. Biological and statistical considerations were used to determine which of a pair of highly correlated variables to retain. Non-correlated variables that had univariable p ≤ 0.15 were assessed in a multivariable global Poisson model, built using manual backwards elimination and an α of 0.05. Variables were considered as confounders if their removal from the model resulted Due to the presence of overdispersion in the final Poisson model (based on comparison of model deviance and degrees of freedom), a global negative binomial model was built. The process of building the negative binomial (NB) model was similar to that of the Poisson model described above. However, the glm.nb function of the MASS package [20] of R [17] was used instead of the glm function. The rest of the model specifications were similar to the Poisson model. Goodness-of-fit of the final global NB model was assessed using Pearson and Deviance chi-square goodness-of-fit tests.

Local models
To assess if the association between each of the predictors and hospitalization risks varied by geographic location, a geographically weighted negative binomial (GWNB) model, proposed by Silva and Rodrigues [21], was fit to the data specifying the same outcome, link function, offset and significant predictors as in the final global multivariable NB model. However, unlike the global multivariable NB model that estimates one regression coefficient for each predictor and thus assumes a constant strength of association across all ZCTAs, the GWNB theoretically estimates as many regression coefficients as the number of ZCTAs. Essentially, GWNB evaluates a local model of the outcome by fitting a regression equation to every ZCTA in the dataset. These separate model equations are constructed by incorporating the outcome and predictor variables of the ZCTAs that fall within the neighborhood of each target ZCTA. Therefore, the GWNB allows identification of local variations in the strength of associations and therefore giving the importance of specific predictors in different local areas (ZCTAs). This implies that some factors may be more important predictors of hospitalization risk in some ZCTAs than others. The local GWNB model was fit in SAS version 9.4 [22] using a SAS/IML macro [23]. The estimation of local regression coefficients was based on biquadratic kernel weighting function [23], while the bandwidth was estimated using the adaptive method which allows the size of the bandwidth to vary based on the density of observations. Bias-corrected Akaike Information Criteria (AICc) was used to determine the optimum kernel bandwidth and for comparing the goodness-of-fit of the global NB and GWNB models. The better fitting model was identified as the one with the lower AICc value.
Stationarity of the GWNB coefficients was assessed using: (a) randomization non-stationarity test based on 999 replications [24]; (b) comparison of the interquartile range of the local GWNB model coefficients with the standard error estimates of the global NB model. Local coefficients whose interquartile ranges were larger than twice the standard error of the regression coefficient from the global NB model were considered non-stationary [25,26].

Cartographic displays
Choropleth maps showing the geographic distributions of ZCTA-level age-adjusted COVID-19 hospitalization risks, the socioeconomic, demographic and chronic disease factors as well as local regression coefficients from the GWNB models were generated using QGIS 3.16.6 [27]. Jenk's optimization classification scheme [28][29][30] was used to determine the critical intervals of the choropleth maps.

Descriptive statistics
The ZCTA-level median percentage of males was 48.6% while that of black and Hispanic populations were 3.7% and 2.2%, respectively (Table 1). For education variables, 38.2% of the population had high school education, 23% had some college education while 8.6% had associate's degree and 18% had bachelor's degree. The median household income was just over $59,400 with 9.5% of the ZCTA-level population living below the poverty line. Among chronic conditions investigated, Chronic Kidney Disease had the highest hospitalization risk (7.6%) followed by diabetes (7.5%) and heart failure had the lowest (3.2%) ( Table 1). The ZCTA-level median number of confirmed cases of COVID-19 was 360 cases which was equivalent to 2.2% of the population at ZCTA-level (Table1).

Predictors of COVID-19 hospitalization risks Global model
A number of variables had univariable associations with COVID-19 hospitalization risks at a relaxed p < 0.2 ( Table 2). Of the assessed demographic variables, only percentage of black population had a univariable association (p < 0.001) with hospitalization risk. By contrast, all the assessed educational, economic, health behavior and chronic disease variables had significant univariable associations with COVID-19 hospitalization risks ( Table 2). The ZCTA-level risk of confirmed COVID-19 cases had a significant (p < 0.001) univariable association with COVID-19 hospitalization risk but the raw count of COVID-19 cases per ZCTA did not (p = 0.9) ( Table 2).
Based on the final global multivariable NB model, the ZCTA-level hospitalization risks were higher in ZCTAs that were high in the following predictors: percentage of black population (p = 0.0416), percentage of population with some college education (p = 0.0005), percentage of individuals hospitalized with diabetes (p < 0.0001), and number of ZCTA-level COVID-19 cases per 100 population (p = 0.0001) ( Table 3). A map of the distribution of age-adjusted COVID-19 hospitalization risks and each of the significant predictors is shown in Fig. 2. High ageadjusted COVID-19 hospitalization risks tended to occur in the Northeast of the study area and included ZCTAs in parts of St. Charles, St. Louis and Louis City counties. These areas also had high percentages of black population, individuals with some college education, those with high diabetes hospitalization risks as well as high risks of COVID-19 cases (Fig. 2). High hospitalization risks were also evident in the Southwest areas of the study area that included parts of Warren and Franklin counties. It is worth noting that these areas also tended to have high diabetes hospitalization risks and percentages of the population with some college education (Fig. 2). Since the global model did not show evidence of good fit based on both Deviance (p = 0.03) and Pearson (p = 0.01) goodness-of-fit tests, stationarity of the regression coefficients was assessed using a local GWNB model.

Local model
The p-values of the stationarity tests from the GWNB model indicate that the coefficients for the association between COVID-19 hospitalization risks and percentage of black population (p = 0.001) and number of hospitalized patients with diabetes per 100 population (p = 0.032) were non-stationary (Table 4). Additionally, comparison of the 2 × SEs of the global coefficients and IQR of the local coefficients showed evidence of non-stationarity of the coefficients of the above two predictors as well as the population adjusted cases of COVID-19 (Table 4).
The spatial distribution of the local coefficients of the three predictors whose relationships with COVID-19 hospitalizations were identified as non-stationary provides visual evidence for variability of the local relationships (Fig. 3). Thus, associations of COVID-19 hospitalization risks with percentage of black population, diabetes hospitalization risks and COVID-19 adjusted cases varied considerably across the study area, with a strong East-West gradient. The association between percentage of black population and COVID-19 hospitalization risks, for instance, was positive in the Northeast and negative in the West and Southwest. Moreover, the strength of the association was higher in ZCTAs in the West compared to those near the center of the study area. The strength of association between COVID-19 hospitalization risks with diabetes hospitalization risks was also higher in the Northeast and lower in the West. All ZCTAs, except one (63,025), showed evidence of

Discussion
The goal of this study was to investigate geographic disparities and identify predictors of ZCTA-level COVID-19 hospitalization risks in the St. Louis area. The findings of this study can be used to identify areas where the population is at higher risk of hospitalization due to COVID-19 in order to guide planning and control efforts and to reduce potential overburdening of hospitals during COVID-19 surges. There was evidence of geographic disparities in COVID-19 age-adjusted hospitalization risks in the study area. Urban ZCTAs in St. Louis City and St. Louis County exhibited high hospitalization risks. These ZCTAs also had high percentages of the population that were black, some as high as 98.4%. Some of these ZCTA also had high diabetes hospitalization risks. It is worth noting that some rural ZCTAs in Franklin and Warren counties had high COVID-19 hospitalization risk but very low percentages of the population that were black. However, these ZCTAs had high diabetes hospitalization risks implying that the COVID hospitalization risks in these rural ZCTAs were more driven by diabetes burden than demographic factors. Thus, although COVID-19 hospitalization risks in the more urban areas seem to be driven by the demographic composition of the population, the risks in the more rural areas seem to be driven more by The above findings are consistent with those from previous ecological studies that investigated risk factors and predictors of COVID-19 hospitalization. For instance, an ecological study by Nguyen et al. reported that diabetes was a significant predictor of high COVID-19 hospitalization risk at the county level in Georgia, USA even after controlling for sociodemographic and economic factors [31]. On the contrary, an Iranian study conducted at the provincial level did not find a significant association when only controlling for chronic disease factors [32]. This may indicate differences of relationships in different geographic areas and populations as well as the importance of controlling for sociodemographic factors when evaluating the impact of chronic disease variables on COVID-19 hospitalization risk. Considering the fact that ZCTAs with high diabetes hospitalization risks also tended to have high COVID-19 hospitalization risks, COVID-19 mitigation efforts may need to be targeted to these ZCTAs to reduce the potential burden of the disease.
Interestingly, the study by Nguyen et al., which also considered sociodemographic and economic factors as well as comorbidities, did not find a significant association between percentage of black population and COVID-19 hospitalization risk [31]. However, it did identify other socioeconomic factors that this study did not consider, including: percentage of children in poverty and percentage of the population with severe housing problems. Although not directly comparable with the current ecological study, previous individual level studies have also identified associations between black race and COVID-19 hospitalization risk [8,13,33].
The local GWNB model allowed modeling of geographically varying associations between COVID-19 hospitalization risks and its predictors instead of assuming constant associations across the study area. Although the coefficients of percentage of black population, diabetes hospitalization risk and risk of COVID-19 infections varied spatially, the coefficients of percentage of the population with some college education did not and hence was modeled as stationary. This suggests that the coefficients for the percentage of the population with some college education are generalizable to all ZCTAs in the study area. In contrast, geographic variations in the associations between COVID-19 hospitalization risks and the percentage of black population, diabetes hospitalization risks and risks of COVID-19 infections suggest that global coefficients do not adequately describe the associations between COVID-19 hospitalization risks and these predictors across the study area. These findings have health planning and service provision implications. For instance, a "one size fits all" approach would not be suitable for addressing geographic disparities in COVID-19 hospitalization risks across the study area since some predictors are more important in some locales than others. Thus, different locales may require slightly different strategies depending on the most important predictors driving COVID-19 hospitalization risk in the location. Therefore, planning for hospital capacity and other disease management and control efforts will need to use evidence-based approaches informed by empirical evidence from both global and local models.

Strengths and limitations
This study used both global and local models to investigate geographic disparities and identify predictors of COVID-19 hospitalization risks in St. Louis region of Missouri. Unlike studies that use only global statistical models and no GIS and/or local models, this study has been able to identify how the associations between the outcome (hospitalization risks) and each of the identified significant predictors change based on geographic location. The use of local models to investigate stationarity of regression coefficients of significant predictors and model non-stationary coefficients is a key strength of the study. This approach is particularly important in guiding local health planning since the importance of different predictors are not constant across the study area implying that different management and control strategies may need to be used in different areas. For example, while race (% black) may be a more important driver of hospitalization risks in some locations, other factors (such as diabetes) play more important roles in other areas. This is very useful information for guiding local health planning and disease control/prevention strategies and suggests that a one-size-fits-all strategy might not be the best approach for disease control. Therefore, modeling approaches that use both global and local models help to better understand the relationships between the outcome and predictors and may be more useful in guiding control efforts at the local level. However, the study is not without limitations. The hospital data has limitations associated with diagnostic classifications of COVID-19 in situations when the patient has co-morbidities that may have contributed to hospitalization. The age-standardization of the hospitalization risks was done using the 2010 US census data which was the most recent US census data available at the time of the study. Additionally, there may be geographic differences in COVID-19 case ascertainment and reporting.

Conclusions
There is evidence of geographic disparities in COVID-19 hospitalization risks in the St. Louis area of Missouri. These disparities are driven by socioeconomic, demographic and health-related factors. The impacts of these factors vary by geographical location with some factors being more important predictors of COVID-19 hospitalization risk in some locales than others. This demonstrates the importance of using not only global but also local models to investigate determinants of geographic disparities in health outcomes and utilization of health services. This study's findings are useful for informing healthcare system planning to identify geographic areas likely to have high numbers of individuals needing hospitalization as well as in guiding vaccination efforts.